feat: [document-compression-updater] Improvements: Auto-remove dummy compression field, drop tracker collection and more #182
Open
NestorRV wants to merge 1 commit into awslabs:master from …
… compression field, drop tracker collection and more

- Add connection retry logic and startup validation
- Fix `UnboundLocalError` on empty collection in `setup()`
- Replace bare `print()` calls with `printLog()` throughout
- Add per-batch progress bar, elapsed time, rate, and ETA logging
- Auto-remove dummy compression field after each batch via `$unset`
- Add `--skip-cleanup` and `--append-log` flags
- Drop tracker collection on successful completion
- Remove dead multiprocessing queue code and unused imports
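The connection retry described in this PR (a `get_mongo_client()` helper that makes up to 3 attempts with a 5 s delay) can be sketched generically as below. This is a minimal illustration under stated assumptions, not the tool's actual code: `connect_with_retry`, its parameters, and the injected `sleep`/`log` callables are hypothetical, and `log` only stands in for the tool's `printLog()`. With pymongo, `connect` could construct a `MongoClient` and run `client.admin.command("ping")`, since the client connects lazily and would otherwise not surface an unreachable URI until the first real operation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,  # hypothetical stand-in for printLog()
) -> T:
    """Return connect()'s result, trying at most `attempts` times in total.

    Hypothetical sketch of a get_mongo_client()-style helper: it waits
    `delay_seconds` between failed attempts and raises ConnectionError
    (chained to the last underlying error) once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # simplification: narrow to driver errors in real code
            last_error = exc
            log(f"Connection attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:  # no pointless wait after the final attempt
                sleep(delay_seconds)
    raise ConnectionError(f"could not connect after {attempts} attempts") from last_error
```

Injecting `sleep` keeps the helper testable without real delays. Catching bare `Exception` is a simplification; a real implementation would more likely catch the driver's connection errors (for pymongo, `pymongo.errors.ConnectionFailure`) so that programming errors are not retried.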
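The per-batch progress logging listed in the commit message (progress bar, elapsed time, rate, and ETA) comes down to a little arithmetic over the cumulative document count: rate is documents processed divided by elapsed time, and ETA is remaining documents divided by that rate. The sketch below assumes progress is counted in documents and the rate is reported in documents per second; `format_progress` and its parameters are hypothetical names, and in the tool the resulting line would presumably be handed to `printLog()`.

```python
import time
from typing import Optional


def format_progress(done: int, total: int, started_at: float,
                    now: Optional[float] = None, width: int = 30) -> str:
    """Build one progress line: bar, percentage, elapsed time, rate, and ETA.

    `done` is the cumulative number of documents processed so far and
    `started_at` a time.monotonic() reading taken when the run began.
    """
    now = time.monotonic() if now is None else now
    elapsed = max(now - started_at, 0.0)
    # Clamp so the bar never overflows if the count overshoots the estimate.
    fraction = min(done / total, 1.0) if total > 0 else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    rate = done / elapsed if elapsed > 0 else 0.0  # documents per second
    remaining = max(total - done, 0)
    eta = remaining / rate if rate > 0 else float("inf")
    eta_text = "unknown" if eta == float("inf") else f"{eta:.0f}s"
    return (f"[{bar}] {fraction:6.1%} {done}/{total} "
            f"elapsed {elapsed:.0f}s rate {rate:.1f} docs/s eta {eta_text}")
```

Calling it once per batch with the running total gives one log line per batch. `time.monotonic()` is the default clock because, unlike wall-clock time, it is not affected by system clock adjustments during a long run.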
Summary
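- Fix `UnboundLocalError` on empty collection in `setup()`
- Replace `print()` calls with `printLog()` throughout
- Auto-remove dummy compression field after each batch via `$unset`
- Add `--skip-cleanup` and `--append-log` flags

Changes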
Error Handling
- `get_mongo_client()` helper with retry logic (up to 3 attempts, 5 s delay), used everywhere a connection is needed
- `validate_connection()` called at startup to verify that the URI is reachable and that the target database and collection exist before any work begins
- Fix `UnboundLocalError` crash in `setup()` when the collection is empty
- Wrap `setup()` and `task_worker()` in `try`/`except`, with reconnect logic in the worker's batch loop

Code Quality
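- Remove unused imports (`threading`, `string`)
- Remove dead multiprocessing queue code from `task_worker()`
- Add `--append-log` flag: the log file is no longer silently deleted on every startup; it is cleared only when the flag is omitted

Observability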
- Replace `print()` calls with `printLog()` for consistent output to both stdout and the log file

Correctness
- Auto-remove the dummy compression field after each batch using `bulk_write` with `$unset`
- Add `--skip-cleanup` flag for cases where removing the field is not required
- … `cleanupComplete` boolean

README
- Document `--append-log` and `--skip-cleanup`

🤖 Generated with Claude Code